fix(connection-manager): add per-address timeout to prevent slow addr… by paschal533 · Pull Request #3412 · libp2p/js-libp2p

paschal533 · 2026-03-16T14:27:13Z

Problem

When dialing a peer with multiple multiaddrs (e.g. [/ip4/10.2.0.2/tcp/9106, /ip4/127.0.0.1/tcp/9106]), a single dialTimeout budget (default 10 s) was shared by all address attempts, which run sequentially. If the first address hung (for example, a private IP in a different subnet whose TCP SYN packets are silently dropped), it consumed the entire budget and the remaining, reachable addresses were never tried.

Root cause

In dial-queue.ts, the batch-level signal (AbortSignal.timeout(dialTimeout)) was passed directly to every individual transportManager.dial() call, so a single slow address could make the whole dial fail.
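To illustrate the root cause, here is a minimal sketch of the pre-fix pattern. It is not the actual dial-queue.ts code: `DialFn` and `dialWithSharedSignal` are hypothetical names, and the dial function is assumed to honour its `signal`. Because every attempt receives the same batch signal, a first address that hangs until that signal fires leaves no budget for the others.

```typescript
// Sketch of the pre-fix behaviour: one batch-level signal shared by every attempt.
// DialFn and dialWithSharedSignal are hypothetical names used for illustration only.
type DialFn = (addr: string, options: { signal: AbortSignal }) => Promise<string>

async function dialWithSharedSignal (addrs: string[], dial: DialFn, dialTimeout: number): Promise<string> {
  // One budget for the whole batch of addresses
  const signal = AbortSignal.timeout(dialTimeout)
  const errors: unknown[] = []

  for (const addr of addrs) {
    try {
      // Every attempt gets the same signal, so a hung attempt burns the shared budget
      return await dial(addr, { signal })
    } catch (err) {
      errors.push(err)
      // Once the shared signal has fired, the whole dial gives up,
      // even if a later address would have been reachable
      if (signal.aborted) throw signal.reason
    }
  }

  throw new AggregateError(errors, 'all addresses failed')
}
```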

Fix

Add addressDialTimeout (default 6_000 ms) to ConnectionManagerConfig. For each address attempt, a per-address signal is composed from the outer batch signal and a fresh AbortSignal.timeout(addressDialTimeout):

const addressSignal = anySignal([signal, AbortSignal.timeout(this.addressDialTimeout)])

- If the per-address timeout fires first → the error is caught and the dial continues with the next address
- If the batch timeout fires first → signal.aborted is true → the whole operation aborts, as before
- .clear() is called in a finally block to prevent listener leaks
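The control flow described above can be sketched as follows. This is a minimal, self-contained approximation, not the code from dial-queue.ts: `anySignalSketch` is a hypothetical stand-in for the `anySignal` helper (it imitates the `.clear()` method the PR relies on), and `DialFn` and `dialSequentially` are illustrative names for a sequential loop over a transport dial that honours its `signal`.

```typescript
// Sketch of the per-address timeout pattern. All names here are hypothetical.
type ClearableSignal = AbortSignal & { clear: () => void }
type DialFn<T> = (addr: string, options: { signal: AbortSignal }) => Promise<T>

// Returns a signal that aborts as soon as any input signal aborts.
// clear() removes the listeners it added to the input signals.
function anySignalSketch (signals: AbortSignal[]): ClearableSignal {
  const controller = new AbortController()
  const listeners: Array<[AbortSignal, () => void]> = []
  const clear = (): void => {
    for (const [signal, listener] of listeners) signal.removeEventListener('abort', listener)
    listeners.length = 0
  }
  for (const signal of signals) {
    if (signal.aborted) {
      controller.abort(signal.reason)
      clear()
      break
    }
    const listener = (): void => {
      controller.abort(signal.reason)
      clear()
    }
    signal.addEventListener('abort', listener)
    listeners.push([signal, listener])
  }
  return Object.assign(controller.signal, { clear })
}

async function dialSequentially<T> (
  addrs: string[],
  dial: DialFn<T>,
  signal: AbortSignal, // batch-level signal, e.g. AbortSignal.timeout(dialTimeout)
  addressDialTimeout: number
): Promise<T> {
  const errors: unknown[] = []
  for (const addr of addrs) {
    // Fresh per-address budget, still bounded by the batch signal
    const addressSignal = anySignalSketch([signal, AbortSignal.timeout(addressDialTimeout)])
    try {
      return await dial(addr, { signal: addressSignal })
    } catch (err) {
      // Batch budget exhausted: abort the whole operation, as before the change
      if (signal.aborted) throw signal.reason
      // Only this address failed or timed out: record it and move on
      errors.push(err)
    } finally {
      // Detach listeners from the long-lived batch signal
      addressSignal.clear()
    }
  }
  throw new AggregateError(errors, 'all addresses failed')
}
```

Because the batch signal is checked in the catch block, a per-address timeout is treated as an ordinary failure, while the batch timeout still propagates as a TimeoutError; if every address fails, the collected errors surface as an AggregateError, which matches the behaviour the PR's tests check for.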

The addressDialTimeout is configurable via connectionManager.addressDialTimeout for users on slower transports (WebRTC, satellite links) who need more time per address.
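When tuning the two settings together, some rough arithmetic helps. This sketch assumes the batch deadline starts when the first address is dialled, that every hung attempt runs for its full per-address timeout, and that connection set-up overhead is negligible; `addressesAttempted` and `budgetForAddress` are hypothetical helpers, not libp2p APIs.

```typescript
// Rough budget arithmetic for dialTimeout vs addressDialTimeout (hypothetical helpers).

// How many addresses get to start before the batch deadline when every earlier one hangs.
// Address k (0-based) starts at k * addressDialTimeout and is attempted only if that
// start time is strictly before dialTimeout.
function addressesAttempted (dialTimeout: number, addressDialTimeout: number): number {
  return Math.ceil(dialTimeout / addressDialTimeout)
}

// Time actually available to address k: its per-address timeout,
// cut short by whatever remains of the batch budget.
function budgetForAddress (k: number, dialTimeout: number, addressDialTimeout: number): number {
  return Math.max(0, Math.min(addressDialTimeout, dialTimeout - k * addressDialTimeout))
}

// With the defaults (dialTimeout 10_000 ms, addressDialTimeout 6_000 ms):
console.log(addressesAttempted(10_000, 6_000)) // 2
console.log(budgetForAddress(1, 10_000, 6_000)) // 4000
```

Under these assumptions, the defaults let one hung address be skipped, and the second address gets only 4 s. Raising addressDialTimeout alone leaves less room for later addresses, so users on slow transports may also want to raise dialTimeout.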

Tests

Added a describe('addressDialTimeout') suite with 5 tests:

- Per-address timeout fires → the next address is tried (verified by call tracking, not just timing)
- Exact bug-report scenario: 3 hung addresses then 1 reachable → all 4 are tried in order and the dial completes in ~3 × addressDialTimeout rather than dialTimeout
- All addresses hit the per-address timeout → AggregateError (not TimeoutError)
- Batch timeout fires first → error name === 'TimeoutError'
- Fast addresses are unaffected (regression guard)

Closes #2368

…esses exhausting the dial budget

dozyio · 2026-03-28T14:23:55Z

Thanks for this @paschal533

Kind of thinking parallel dial with limits (2 or 3 concurrent)... Maybe dial best address first via AddressSorter then after 250ms race the rest?

achingbrain · 2026-03-30T13:39:24Z

> then after 250ms race the rest?

If you mean dial them in parallel, I would be careful here, as this will consume a lot of resources, particularly if the node is doing, for example, DHT things where you expect most dials to fail.

This opens the node up to an amplification-style attack where a peer might have a peer record with lots of bogus or slow addresses, so a single dial request causes the target node to make many dials, all of which are doomed.

The current behaviour of dialling addresses sequentially was introduced partly because of the above, but also due to empirical observations that if the first address dialled is a "good" candidate (e.g. public, non-NATed, or supports hole punching) and it fails, all other addresses will likely fail too.

That aside, having a shorter per-address timeout signal in addition to an overall dial timeout seems like a reasonable approach.

dozyio · 2026-03-31T18:51:55Z

Thanks for the context @achingbrain

tabcat · 2026-04-01T10:44:23Z

+/**
+ * @see https://libp2p.github.io/js-libp2p/interfaces/index._internal_.ConnectionManagerConfig.html#addressDialTimeout
+ */
+export const ADDRESS_DIAL_TIMEOUT = 6_000


This default looks reasonable.

In the future, I wonder whether it would be beneficial to add a smaller timeout at the raw connection layer, so that a dial times out quickly (~4 seconds) when the destination host is offline.

@see

- Fix JSDoc @see URL to point to the public ConnectionManagerInit interface instead of the internal ConnectionManagerConfig
- Replace the timing-based assertion in the sequential-hang test with a signal-based one: listen for abort on each hung dial's options.signal and count the aborts. This verifies that the per-address AbortSignal actually fired three times, rather than measuring elapsed wall-clock time, which is flaky on loaded CI runners.

paschal533 · 2026-04-01T11:42:51Z

Hey @tabcat, I made both changes. I fixed the JSDoc URL to point to ConnectionManagerInit instead of the internal ConnectionManagerConfig. For the timing test, I replaced the elapsed-time assertions with a signal-based approach: an abort listener inside callsFake increments dialsAborted, and the test then asserts expect(dialsAborted).to.equal(3). This directly verifies that the per-address signal actually fired for the 3 hung dials instead of relying on wall-clock time. I also tested it in a real environment with 2 blackhole TCP servers (they accept the connection but never send data, so the Noise handshake hangs) plus 1 real listener: it connected in ~1200 ms, which matches 2 × addressDialTimeout rather than the 10 s batch timeout. Works as expected.
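The abort-counting idea can be sketched without sinon. This is an illustration, not the PR's test code: `makeHungDialer`, `dialEach` and `Dial` are hypothetical names, and the fake dialer simply hangs until its signal aborts, counting each abort.

```typescript
// Hypothetical names throughout; this is not the PR's test code.
type Dial = (addr: string, options: { signal: AbortSignal }) => Promise<unknown>

// Fake dialer whose dials never complete on their own (like a blackholed address).
// When a dial's signal aborts, the abort is counted and the dial rejects.
function makeHungDialer (): { dial: Dial, aborts: () => number } {
  let dialsAborted = 0
  const dial: Dial = async (_addr, { signal }) => await new Promise<never>((_resolve, reject) => {
    const onAbort = (): void => {
      dialsAborted++
      reject(signal.reason)
    }
    if (signal.aborted) {
      onAbort()
      return
    }
    signal.addEventListener('abort', onAbort, { once: true })
  })
  return { dial, aborts: () => dialsAborted }
}

// Dial each address in turn, giving every attempt its own short timeout signal.
async function dialEach (addrs: string[], dial: Dial, addressDialTimeout: number): Promise<void> {
  for (const addr of addrs) {
    try {
      await dial(addr, { signal: AbortSignal.timeout(addressDialTimeout) })
    } catch {
      // Expected here: the per-address timeout aborted this attempt
    }
  }
}
```

A test then asserts on the count (three hung addresses should yield exactly three aborts) rather than on elapsed time, so a slow CI runner changes how long the test takes but not whether it passes.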

paschal533 added 3 commits March 16, 2026 15:21

fix(connection-manager): add per-address timeout to prevent slow addr…

2382154

…esses exhausting the dial budget

fix lint: add comment to empty catch block and rename Promise parameter

4554a40

fix(connection-manager): increase addressDialTimeout default to 6s

2084ec0

paschal533 marked this pull request as ready for review March 16, 2026 15:51

paschal533 requested a review from a team as a code owner March 16, 2026 15:51

tabcat reviewed Mar 19, 2026

Comment thread packages/libp2p/src/connection-manager/constants.defaults.ts Outdated

tabcat reviewed Mar 19, 2026

Comment thread packages/libp2p/src/connection-manager/dial-queue.ts Outdated

Comment thread packages/libp2p/src/connection-manager/index.ts Outdated

tabcat requested a review from dozyio March 19, 2026 11:01

paschal533 added 2 commits March 19, 2026 14:06

fix(connection-manager): correct @default jsdoc for addressDialTimeout

718cb67

revert: remove comment added to isDialable catch block

cc1c321

tabcat requested changes Apr 1, 2026

Comment thread packages/libp2p/src/connection-manager/constants.defaults.ts Outdated

Comment thread packages/libp2p/test/connection-manager/dial-queue.spec.ts Outdated

tabcat reviewed Apr 1, 2026

paschal533 force-pushed the fix/dial-queue-per-address-timeout branch from ed86ac1 to cc1c321 April 1, 2026 11:36

tabcat approved these changes Apr 2, 2026

Merge branch 'main' into fix/dial-queue-per-address-timeout

d211965

dozyio approved these changes Apr 11, 2026

dozyio merged commit 2151144 into libp2p:main Apr 11, 2026
48 of 49 checks passed

tabcat mentioned this pull request Apr 12, 2026

chore: release main #3447

Merged

dozyio merged 7 commits into libp2p:main from paschal533:fix/dial-queue-per-address-timeout

4 participants